Transactional Support in MapReduce for Speculative Parallelism
Abstract
MapReduce has emerged as a popular programming model for large-scale distributed computing. Its framework enforces strict synchronization between successive map and reduce phases and allows only limited data sharing within a phase. Using key-value based persistent storage with MapReduce presents intriguing opportunities and challenges. The challenges relate primarily to semantic inconsistencies that arise because the execution environment and the underlying storage medium employ different fault-tolerance mechanisms. We define formal transactional semantics for MapReduce over reliable key-value stores. With minimal performance overhead and no increase in program complexity, our solutions support broad classes of distributed applications hitherto infeasible in MapReduce. Specifically, this paper (i) motivates the use of key-value stores as the underlying storage for MapReduce, (ii) defines transactional semantics for MapReduce to address these inconsistencies, (iii) demonstrates the broader application scope enabled by data sharing within and across jobs, and (iv) presents a detailed evaluation demonstrating the low overhead of our proposed semantics.
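The abstract does not spell out how the transactional semantics are realized. As a rough illustration only, the sketch below shows one common way to make a re-executed or speculative task safe against a shared key-value store: optimistic concurrency with buffered writes and version validation at commit. Every name here (`VersionedStore`, `TaskTransaction`, `run_task`) is hypothetical, the store is a toy in-memory stand-in, and nothing in it should be read as the paper's actual design.

```python
class VersionedStore:
    """Toy in-memory key-value store with a version per key (assumption)."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def read(self, key):
        # Missing keys read as (None, 0).
        return self._data.get(key, (None, 0))

    def commit(self, read_set, write_set):
        # Validate: every key the task read must still be at the version it saw.
        for key, seen_version in read_set.items():
            if self.read(key)[1] != seen_version:
                return False  # conflict: apply nothing
        # Apply all buffered writes, bumping each key's version.
        for key, value in write_set.items():
            _, version = self.read(key)
            self._data[key] = (value, version + 1)
        return True


class TaskTransaction:
    """Buffers one task's reads and writes until commit (hypothetical)."""

    def __init__(self, store):
        self.store = store
        self.read_set = {}   # key -> version first observed
        self.write_set = {}  # key -> pending value

    def get(self, key):
        if key in self.write_set:  # read-your-own-writes
            return self.write_set[key]
        value, version = self.store.read(key)
        self.read_set.setdefault(key, version)
        return value

    def put(self, key, value):
        self.write_set[key] = value  # invisible to others until commit

    def commit(self):
        return self.store.commit(self.read_set, self.write_set)


def run_task(store, task, max_attempts=10):
    """Re-execute the task from scratch until its commit validates."""
    for _ in range(max_attempts):
        txn = TaskTransaction(store)
        task(txn)
        if txn.commit():
            return True
    return False


def increment(txn):
    # Example task body: a read-modify-write on a shared counter.
    txn.put("counter", (txn.get("counter") or 0) + 1)


store = VersionedStore()
run_task(store, increment)
run_task(store, increment)
```

Because writes stay in the task's private buffer until validation succeeds, a failed or duplicated attempt leaves no partial side effects in the store, which is the kind of inconsistency between replay-based execution and persistent storage that the abstract describes.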
Similar resources
TransMR: Data-Centric Programming Beyond Data Parallelism
MapReduce and related data-centric programming models have proven to be effective for a variety of large-scale distributed computations, in particular, those that manifest data parallelism. The fault-tolerance model underlying these programming environments relies on deterministic replay, which makes data-sharing (side-effects) across computations harder to support. This significantly limits th...
Exploring Speculative Parallelism in Spec2006
The computer industry adopted multi-threaded and multi-core architectures when clock-rate increases stalled in the early 2000s. It was hoped that continuous improvement of single-program performance could be achieved through these architectures. However, traditional parallelizing compilers often fail to effectively parallelize general-purpose applications which typically have complex control ...
Improving Continuation-Powered Method-Level Speculation for JVM Applications
Most applications running on the Java Virtual Machine (JVM) make extensive use of dynamic object-oriented programming features such as inheritance, polymorphism, and encapsulation. This makes them very hard or even impossible to analyze statically, defeating most of the automatic parallelization research done so far for traditional compute-heavy scientific applications. In this paper, we propose...
Automatic Tuning of the Parallelism Degree in Hardware Transactional Memory
Transactional Memory (TM) is an emerging paradigm that promises to ease the development of parallel applications. Due to its inherently speculative nature, however, TM can suffer from performance degradation in the presence of conflict-intensive workloads. A key technique to tackle this issue consists in dynamically regulating the number of concurrent threads, which allows for selecting the concurre...
Speculative Concurrent Processing with Transactional Memory in the Actor Model
The actor model has been successfully used for scalable computing in distributed systems. Actors are objects with a local state, which can only be modified by the exchange of messages. One of the fundamental principles of actor models is to guarantee sequential message processing, which avoids typical concurrency hazards, but limits the achievable message throughput. Preserving the sequential s...